Data Allocation Strategies for Dense Linear Algebra Kernels on Heterogeneous Two-dimensional Grids

Authors

  • Olivier Beaumont
  • Vincent Boudet
  • Fabrice Rastello
  • Yves Robert
Abstract

We study the implementation of dense linear algebra computations, such as matrix multiplication and linear system solvers, on two-dimensional (2D) grids of heterogeneous processors. For these operations, 2D-grids are the key to scalability and efficiency. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these operations on heterogeneous grids to the speed of the slowest processor. We present and study more sophisticated data allocation strategies that balance the load on heterogeneous 2D-grids with respect to the performance of the processors. The practical usefulness of these strategies is fully demonstrated by experimental data for a heterogeneous network of workstations.
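The claim that a uniform distribution is bound by the slowest processor can be illustrated with a small sketch. The code below is a simplified one-dimensional illustration, not the allocation strategy from the paper: it assumes each processor has a known cycle time (time to process one block) and compares an even split against a greedy assignment that always gives the next block to the processor that would finish it earliest. The function names and the cycle-time values are hypothetical.

```python
def uniform_allocation(num_blocks, cycle_times):
    """Split blocks as evenly as possible, ignoring processor speed."""
    p = len(cycle_times)
    counts = [num_blocks // p] * p
    for i in range(num_blocks % p):  # hand out the remainder one by one
        counts[i] += 1
    return counts


def greedy_allocation(num_blocks, cycle_times):
    """Give each block to the processor that would finish it earliest.

    Assumption: cycle_times[i] is the time processor i needs per block.
    """
    counts = [0] * len(cycle_times)
    for _ in range(num_blocks):
        best = min(
            range(len(cycle_times)),
            key=lambda i: (counts[i] + 1) * cycle_times[i],
        )
        counts[best] += 1
    return counts


def makespan(counts, cycle_times):
    """Completion time is set by the processor that finishes last."""
    return max(c * t for c, t in zip(counts, cycle_times))


if __name__ == "__main__":
    cycle_times = [1, 2, 4]  # hypothetical: processor 0 is 4x faster than 2
    for name, alloc in (("uniform", uniform_allocation),
                        ("greedy", greedy_allocation)):
        counts = alloc(14, cycle_times)
        print(name, counts, makespan(counts, cycle_times))
```

In this toy setting the even split leaves the fast processors idle while the slowest one finishes, whereas the speed-aware assignment keeps completion times close together. On a 2D grid the problem is harder, because row and column shares must be chosen jointly.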

Similar resources

A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers)

In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication or linear system solvers, on heterogeneous networks of workstations. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these linear algebra kernels on heterogeneous grids to the speed of the slowest processor...

Data parallel scheduling of Operations in Linear Algebra on heterogeneous clusters

The aim of data and task parallel scheduling for dense linear algebra kernels is to minimize the processing time of an application composed of several linear algebra kernels. The scheduling strategy presented here combines the task parallelism used when scheduling independent tasks and the data parallelism used for linear algebra kernels. This problem has been studied for scheduling independent...

Exposing Inner Kernels and Block Storage for Fast Parallel Dense Linear Algebra Codes

Efficient execution on processors with multiple cores requires the exploitation of parallelism within the processor. For many dense linear algebra codes this, in turn, requires the efficient execution of codes which operate on relatively small matrices. Efficient implementations of dense Basic Linear Algebra Subroutines exist (BLAS libraries). However, calls to BLAS libraries introduce large ov...

Accelerating GPU Kernels for Dense Linear Algebra

Implementations of the Basic Linear Algebra Subprograms (BLAS) interface are a major building block of dense linear algebra (DLA) libraries, and therefore have to be highly optimized. We present some techniques and implementations that significantly accelerate the corresponding routines from currently available libraries for GPUs. In particular, Pointer Redirecting – a set of GPU-specific optimiz...

Publication date: 2000